Rademacher Observations, Private Data, and Boosting

Authors

  • Richard Nock
  • Giorgio Patrini
  • Arik Friedman
Abstract

The minimization of the logistic loss is a popular approach to batch supervised learning. Our paper starts from the surprising observation that, when fitting linear (or kernelized) classifiers, the minimization of the logistic loss is equivalent to the minimization of an exponential rado-loss computed (i) over transformed data that we call Rademacher observations (rados), and (ii) over the same classifier as the one used for the logistic loss. Thus, a classifier learnt from rados can be directly used to classify observations. We provide a learning algorithm over rados with boosting-compliant convergence rates on the logistic loss (computed over examples). Experiments on domains with up to millions of examples, backed up by theoretical arguments, show that learning over a small set of random rados can challenge the state of the art that learns over the complete set of examples. We show that rados comply with various privacy requirements that make them good candidates for machine learning in a privacy framework. We give several algebraic, geometric and computational hardness results on reconstructing examples from rados. We also show how it is possible to craft, and efficiently learn from, rados in a differential privacy framework. Tests reveal that learning from differentially private rados can compete with learning from random rados, and hence with batch learning from examples, achieving non-trivial privacy vs. accuracy tradeoffs.
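The equivalence stated in the abstract can be checked numerically. The sketch below (Python/NumPy, written for this page rather than taken from the paper) assumes the rado construction π_σ = ½ Σ_i (σ_i + y_i) x_i for σ ∈ {−1, +1}^m and brute-forces all 2^m rados on a tiny dataset; up to the additive constant m·log 2, the summed logistic loss over the examples matches the log of the exponential loss averaged over the rados.

    # Sanity check (sketch): logistic loss over examples vs. exponential
    # loss over all Rademacher observations (rados), assuming the rado
    # construction pi_sigma = 0.5 * sum_i (sigma_i + y_i) * x_i.
    import itertools
    import numpy as np

    rng = np.random.default_rng(0)
    m, d = 6, 3                        # keep m small: we enumerate 2^m rados
    X = rng.normal(size=(m, d))        # examples
    y = rng.choice([-1.0, 1.0], m)     # labels in {-1, +1}
    theta = rng.normal(size=d)         # any linear classifier

    # Logistic loss, summed over the m examples.
    log_loss = np.sum(np.log1p(np.exp(-y * (X @ theta))))

    # Exponential rado-loss, averaged over every sigma in {-1, +1}^m.
    exp_losses = []
    for sigma in itertools.product([-1.0, 1.0], repeat=m):
        pi_sigma = 0.5 * ((np.array(sigma) + y) @ X)   # one rado
        exp_losses.append(np.exp(-theta @ pi_sigma))
    rado_loss = np.log(np.mean(exp_losses))

    print(log_loss, rado_loss + m * np.log(2.0))       # the two numbers agree

In practice the paper learns from a small random subset of rados rather than all 2^m of them; the exhaustive enumeration here only exhibits the identity.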

Related resources

Learning Games and Rademacher Observations Losses

It has recently been shown that supervised learning with the popular logistic loss is equivalent to optimizing the exponential loss over sufficient statistics about the class: Rademacher observations (rados). We first show that this unexpected equivalence can actually be generalized to other example / rado losses, with necessary and sufficient conditions for the equivalence, exemplified on four...

Outlier Detection by Boosting Regression Trees

A procedure for detecting outliers in regression problems is proposed. It is based on information provided by boosting regression trees. The key idea is to select the most frequently resampled observation along the boosting iterations and reiterate after removing it. The selection criterion is based on Tchebychev’s inequality applied to the maximum over the boosting iterations of ...

Deep Boosting

…Σ_{j=1}^{N_k} h_{k,j} | ∀(k, j) ∈ [p]×[N_k], h_{k,j} ∈ H_k}, and the union of all such families G_{F,n} = ∪_{|N|=n} G_{F,N}. Fix ρ > 0. For a fixed N, the Rademacher complexity of G_{F,N} can be bounded as follows for any m ≥ 1: R_m(G_{F,N}) ≤ (1/n) Σ_{k=1}^{p} N_k R_m(H_k). Thus, the following standard margin-based Rademacher complexity bound holds (Koltchinskii & Panchenko, 2002). For any δ > 0, with probability at least 1 − δ, for all g ∈ G_{F,…}

On Regularizing Rademacher Observation Losses

It has recently been shown that supervised learning of linear classifiers with two of the most popular losses, the logistic and square loss, is equivalent to optimizing an equivalent loss over sufficient statistics about the class: Rademacher observations (rados). It has also been shown that learning over rados brings solutions to two prominent problems for which the state of the art of learning f...

Rademacher complexity properties 2: finite classes and margin losses

To make this meaningful to machine learning, we need to replace 𝔼f with some form of risk. Today we will discuss three choices. 1. R_ℓ where ℓ is Lipschitz. We covered this last time but will recap a little. 2. R_z(f) := Pr[f(X) ≠ Y]; for this we'll use finite classes and discuss shatter coefficients and VC dimension. 3. R_γ(f) = R_{ℓ_γ} where ℓ_γ(z) := max{0, min{z/γ + 1, 1}} will lead to nice bounds for...
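A minimal illustration of the third choice above: the ramp loss ℓ_γ can be coded directly from its definition (the NumPy sketch below is mine, not from the notes).

    import numpy as np

    def ell_gamma(z, gamma):
        # ell_gamma(z) = max{0, min{z/gamma + 1, 1}}: equals 0 for z <= -gamma,
        # 1 for z >= 0, and is linear in between.
        return np.clip(z / gamma + 1.0, 0.0, 1.0)

    print(ell_gamma(np.array([-2.0, -1.0, -0.5, 0.0, 1.0]), gamma=1.0))
    # -> [0.  0.  0.5 1.  1. ]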


Publication date: 2015